Human reliability is related to the field of human factors engineering and ergonomics, and refers to the reliability of humans in fields such as manufacturing, transportation, the military, or medicine. Human performance can be affected by many factors, such as age, state of mind, physical health, attitude, emotions, and a propensity for certain common mistakes, errors, and cognitive biases.
Human reliability is important because humans contribute to the resilience of systems and because human errors or oversights can have adverse consequences, especially when the human is a crucial part of a large socio-technical system, as is common today. User-centered design and error-tolerant design are just two of many terms used to describe efforts to make technology better suited to operation by humans.
A variety of methods exist for human reliability analysis (HRA).[1][2] Two general classes of methods are those based on probabilistic risk assessment (PRA) and those based on a cognitive theory of control.
One method for analyzing human reliability is a straightforward extension of probabilistic risk assessment (PRA): in the same way that equipment can fail in a plant, a human operator can commit errors. In both cases, an analysis (functional decomposition for equipment and task analysis for humans) breaks the system down to a level of detail at which failure or error probabilities can be assigned. This basic idea is behind the Technique for Human Error Rate Prediction (THERP).[3] THERP is intended to generate human error probabilities that can be incorporated into a PRA. The Accident Sequence Evaluation Program (ASEP) human reliability procedure is a simplified form of THERP; an associated computational tool is the Simplified Human Error Analysis Code (SHEAN).[4] More recently, the US Nuclear Regulatory Commission has published the Standardized Plant Analysis Risk Human Reliability Analysis (SPAR-H) method to account for the potential for human error.[5][6]
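Once a task analysis has assigned a human error probability (HEP) to each step, the PRA-style arithmetic for a task whose steps must all succeed is simple. The sketch below assumes the steps are performed in series and that their errors are independent; THERP itself also adjusts HEPs for dependence between steps and for performance shaping factors, which this sketch omits. The step names and HEP values are hypothetical, chosen only for illustration.

```python
from math import prod


def task_failure_probability(heps: list[float]) -> float:
    """Probability that at least one step of a task fails.

    Assumes the steps are performed in series (every step must succeed)
    and that errors on different steps are statistically independent.
    """
    for p in heps:
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"HEP out of range: {p}")
    # The task succeeds only if every individual step succeeds.
    p_success = prod(1.0 - p for p in heps)
    return 1.0 - p_success


# Hypothetical task analysis: three steps with illustrative HEPs.
steps = {
    "read gauge": 0.003,
    "select correct valve": 0.001,
    "close valve": 0.01,
}
p_fail = task_failure_probability(list(steps.values()))
print(f"Task failure probability: {p_fail:.5f}")  # 0.01396
```

The resulting task failure probability is the kind of number that would then be attached to a basic event in a fault tree or event tree alongside equipment failure probabilities.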
Erik Hollnagel has developed the second class of methods, those based on a cognitive theory of control, in his work on the Contextual Control Model (COCOM)[7] and the Cognitive Reliability and Error Analysis Method (CREAM).[8] COCOM models human performance as a set of four control modes: strategic (based on long-term planning), tactical (based on procedures), opportunistic (based on present context), and scrambled (random). It also proposes a model of how transitions between these control modes occur. In this model, transitions depend on a number of factors, including the human operator's estimate of the outcome of the action (success or failure), the time remaining to accomplish the action (adequate or inadequate), and the number of simultaneous goals the operator has at that time. CREAM is a human reliability analysis method based on COCOM.
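COCOM's transitions are described here only qualitatively, but the control modes form an ordered set, so the model lends itself to a state-machine representation. The following is a toy sketch of that idea, not Hollnagel's published model: the transition rule (move one mode toward strategic control when all three factors are favorable, one mode toward scrambled control when at most one is, otherwise stay put) and the hypothetical `max_goals` threshold are assumptions made purely for illustration.

```python
from enum import IntEnum


class ControlMode(IntEnum):
    # The four COCOM control modes, ordered from least to most control.
    SCRAMBLED = 0
    OPPORTUNISTIC = 1
    TACTICAL = 2
    STRATEGIC = 3


def next_mode(mode: ControlMode, expects_success: bool, time_adequate: bool,
              num_goals: int, max_goals: int = 3) -> ControlMode:
    """Toy transition rule (hypothetical, NOT Hollnagel's published rules).

    Counts how many of the three factors are favorable and shifts the
    control mode by at most one step, clamped to the valid range.
    """
    favorable = sum([expects_success, time_adequate, num_goals <= max_goals])
    if favorable == 3:
        step = 1    # all conditions favorable: gain control
    elif favorable <= 1:
        step = -1   # mostly unfavorable: lose control
    else:
        step = 0    # mixed conditions: stay in the current mode
    clamped = min(max(mode + step, ControlMode.SCRAMBLED), ControlMode.STRATEGIC)
    return ControlMode(clamped)


mode = ControlMode.TACTICAL
# Success expected, enough time, few goals: control improves.
mode = next_mode(mode, expects_success=True, time_adequate=True, num_goals=1)
print(mode.name)  # STRATEGIC
# Failure expected, time running out, many goals: control degrades.
mode = next_mode(mode, expects_success=False, time_adequate=False, num_goals=5)
print(mode.name)  # TACTICAL
```

Moving at most one mode per step is a simplifying design choice; a richer model could allow abrupt drops to scrambled control under severe time pressure.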
Related techniques in safety engineering and reliability engineering include failure mode and effects analysis (FMEA), hazard and operability study (HAZOP), fault tree analysis, and SAPHIRE (Systems Analysis Programs for Hands-on Integrated Reliability Evaluations).
Since the end of World War II, human factors issues have become a paramount concern in aviation safety. It is estimated that between 90% and 95% of aviation accidents and incidents are caused by human factors. But what exactly counts as a human factor? Human factors is an all-encompassing effort to compile data about human capabilities and limitations and to apply that data to equipment, systems, software, facilities, procedures, jobs, environments, training, staffing, and personnel management in order to produce safe, comfortable, ergonomic, and effective human performance. The FAA is currently working to integrate human factors into all aspects of aviation where safety is a major concern. As a result, the FAA issued FAA Order 9550.8, a human factors policy that states the following:
Human factors shall be systematically integrated into the planning and execution of the functions of all FAA elements and activities associated with system acquisitions and system operations. FAA endeavors shall emphasize human factors considerations to enhance system performance and capitalize upon the relative strengths of people and machines. These considerations shall be integrated at the earliest phases of FAA projects.
The FAA has recognized that when most individuals think of a system or project, they usually consider only the tangibles, such as hardware, software, and equipment, and fail to think about the end user of the product: the human being. As a result, differences in users' aptitudes and abilities are often not considered during system design. The FAA is combating this predominant thought pattern by introducing what is known as “Total System Performance”. Total System Performance is expressed as a probability: the probability that the total system will perform correctly, when it is available, is the probability that the hardware/software will perform correctly, times the probability that the operating environment will not degrade the system operation, times the probability that the user will perform correctly. It has been found that a system can work perfectly in a test environment, demonstration site, or laboratory and then not perform as well once a human being enters the loop as the operator. To compensate, human factors must be accounted for and integrated into new systems; doing so increases performance accuracy, decreases performance time, and enhances safety. FAA research has indicated that designing systems to improve human performance is cost-effective and safe when done early in the development stages of a project. Potential human factors to consider during research and development include functional design, safety and health, work space, displays and controls, information requirements, display presentation, visual and aural alerts, communications, anthropometrics, and environment.
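The Total System Performance definition is a product of three probabilities, which also explains why a system that looks flawless in the lab can underperform in the field. The sketch below computes it directly; the function name and all probability values are hypothetical, and multiplying the factors implicitly treats them as independent.

```python
def total_system_performance(p_hw_sw: float, p_env: float, p_user: float) -> float:
    """Probability that the total system performs correctly when available.

    Product of: P(hardware/software performs correctly),
    P(operating environment does not degrade operation), and
    P(user performs correctly). The product treats the three as independent.
    """
    for name, p in (("hardware/software", p_hw_sw),
                    ("environment", p_env),
                    ("user", p_user)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} probability out of range: {p}")
    return p_hw_sw * p_env * p_user


# Hypothetical lab test: environment and operator effects assumed negligible.
lab = total_system_performance(0.999, 1.0, 1.0)
# Hypothetical field use: the same hardware with a real environment and operator.
field = total_system_performance(0.999, 0.98, 0.95)
print(f"lab: {lab:.4f}, field: {field:.4f}")  # lab: 0.9990, field: 0.9301
```

With these illustrative numbers, the human and environmental factors, not the hardware, dominate the drop from 99.9% to about 93%, which is the point of designing for the operator early.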
Human error has been cited as a cause or contributing factor in disasters and accidents in industries as diverse as nuclear power (e.g., the Three Mile Island accident), aviation (see pilot error), space exploration (e.g., the Space Shuttle Challenger disaster), and medicine (see medical error). It is also important to stress that "human error" mechanisms are the same as "human performance" mechanisms: performance is categorized as 'error' only in hindsight.[9][10] Actions later termed "human error" are therefore part of the ordinary spectrum of human behavior. The study of absent-mindedness in everyday life provides ample documentation and categorization of such aspects of behavior. While human error is firmly entrenched in the classical approaches to accident investigation and risk assessment, it has no role in newer approaches such as resilience engineering.[11]
There are many ways to categorize human error.[12][13]
The cognitive study of human error is a very active research field, including work on the limits of memory and attention and on decision-making strategies such as the availability heuristic, as well as on other cognitive biases. Such heuristics and biases are strategies that are useful and often correct, but they can lead to systematic patterns of error.
Misunderstandings in human communication have been studied in conversation analysis, for example by examining violations of the cooperative principle and the Gricean maxims.
Organizational studies of error or dysfunction have included studies of safety culture. One technique for organizational analysis is the Management Oversight Risk Tree (MORT).[19][20]
The Human Factors Analysis and Classification System (HFACS) was developed initially as a framework to understand "human error" as a cause of aviation accidents.[21][22] It is based on James Reason's Swiss cheese model of human error in complex systems. HFACS distinguishes between the "active failures" of unsafe acts, and "latent failures" of preconditions for unsafe acts, unsafe supervision, and organizational influences. These categories were developed empirically on the basis of many aviation accident reports.
"Unsafe acts" are performed by the human operator "on the front line" (e.g., the pilot, the air traffic controller, the driver). Unsafe acts can be either errors (in perception, decision making, or skill-based performance) or violations (routine or exceptional). These errors are similar to those discussed above. Violations are the deliberate disregard of rules and procedures. As the name implies, routine violations are those that occur habitually and are usually tolerated by the organization or authority. Exceptional violations are unusual and often extreme. For example, driving 60 mph in a 55-mph zone is a routine violation, but driving 130 mph in the same zone is exceptional.
There are two types of preconditions for unsafe acts: those that relate to the human operator's internal state and those that relate to the human operator's practices or ways of working. Adverse internal states include those related to physiology (e.g., illness) and mental state (e.g., mentally fatigued, distracted). A third aspect of 'internal state' is really a mismatch between the operator's ability and the task demands; for example, the operator may be unable to make visual judgments or react quickly enough to support the task at hand. Poor operator practices are another type of precondition for unsafe acts. These include poor crew resource management (issues such as leadership and communication) and poor personal readiness practices (e.g., violating the crew rest requirements in aviation).
Four types of unsafe supervision are: inadequate supervision; planned inappropriate operations; failure to correct a known problem; and supervisory violations.
Organizational influences include those related to resource management (e.g., inadequate human or financial resources), organizational climate (structures, policies, and culture), and organizational processes (such as procedures, schedules, and oversight).
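Because HFACS is a layered taxonomy, it maps naturally onto a simple lookup structure that an investigator could use to code accident reports. The sketch below encodes the four levels and the subcategories described above, and flags which level counts as active versus latent failures. The label strings are paraphrases of this article's wording, not the official HFACS category names, and the `classify` helper is hypothetical.

```python
from dataclasses import dataclass

# The four HFACS levels with the subcategories described above.
# Only "Unsafe acts" are active failures; the other three levels are latent.
HFACS = {
    "Unsafe acts": {
        "active": True,
        "subcategories": [
            "Perceptual error", "Decision error", "Skill-based error",
            "Routine violation", "Exceptional violation",
        ],
    },
    "Preconditions for unsafe acts": {
        "active": False,
        "subcategories": [
            "Adverse physiological state", "Adverse mental state",
            "Ability/task-demand mismatch",
            "Poor crew resource management", "Poor personal readiness",
        ],
    },
    "Unsafe supervision": {
        "active": False,
        "subcategories": [
            "Inadequate supervision", "Planned inappropriate operations",
            "Failure to correct a known problem", "Supervisory violation",
        ],
    },
    "Organizational influences": {
        "active": False,
        "subcategories": [
            "Resource management", "Organizational climate",
            "Organizational process",
        ],
    },
}


@dataclass(frozen=True)
class Classification:
    level: str
    failure_type: str  # "active" or "latent"


def classify(subcategory: str) -> Classification:
    """Return the HFACS level and failure type for a subcategory label."""
    for level, info in HFACS.items():
        if subcategory in info["subcategories"]:
            return Classification(level, "active" if info["active"] else "latent")
    raise KeyError(f"unknown HFACS subcategory: {subcategory!r}")


print(classify("Routine violation"))
print(classify("Failure to correct a known problem"))
```

A linear scan is adequate for a taxonomy this small; a larger coding scheme would invert the dictionary once into a subcategory-to-level index.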
Some researchers have argued that the dichotomy of human actions as "correct" or "incorrect" is a harmful oversimplification of a complex phenomenon.[23][24] A focus on the variability of human performance, and on how human operators (and organizations) can manage that variability, may be a more fruitful approach. Newer approaches, such as the resilience engineering mentioned above, highlight the positive roles that humans can play in complex systems. In resilience engineering, successes (things that go right) and failures (things that go wrong) are seen as having the same basis, namely human performance variability. A specific account of this is the efficiency–thoroughness trade-off (ETTO) principle,[25] which can be found at all levels of human activity, individual as well as collective.